{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "f20600ce-d57a-446e-b033-3aadeec39c1b",
   "metadata": {},
   "source": [
    "# LlamaParse with GPT-4o\n",
    "\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/parse/test_tesla_impact_report/test_gpt4o.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "Status:\n",
    "| Last Executed | Version | State      |\n",
    "|---------------|---------|------------|\n",
    "| Prior to Feb-2025   | N/A  | Deprecated |\n",
    "\n",
    "GPT-4o is a [fully multimodal model by OpenAI](https://openai.com/index/hello-gpt-4o/) released in May 2024. It matches GPT-4 Turbo performance in text and code, and has significantly improved vision and audio capabilities.\n",
    "\n",
    "The expanded vision/audio capabilities mean that it can be used for document parsing, by treating each page as an image and performing document extraction. We support using GPT-4o natively in LlamaParse for document parsing. The notebook below walks you through an example of using GPT-4o over the Tesla impact report.\n",
    "\n",
    "**NOTE**: The pricing for LlamaParse + gpt4o is an order more expensive than using LlamaParse by default. Currently, every page parsed with gpt4o counts for 10 pages in the LlamaParse usage tracker.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86b173ac-9fce-4813-bdf1-6dd7d93a491d",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ecc5eba5-96ce-4db7-bba1-f9ece33e681c",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b805592b-d1a5-4cd2-b916-348f66ca7941",
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = \"<LLAMA_CLOUD_API_KEY>\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e73e3c4-9e09-4cba-805f-326c82be812d",
   "metadata": {},
   "source": [
    "### Use LlamaParse with `gpt4o_mode=True`\n",
    "\n",
    "By turning on gpt4o, we use GPT-4o multimodal capabilities to do document parsing per page instead of the LlamaParse default pipeline.\n",
    "\n",
    "We load a snippet of the [2019 Tesla impact report](https://www.tesla.com/ns_videos/2019-tesla-impact-report.pdf). **NOTE**: The report is 57 pages, but will count for 570 pages in LlamaParse due to GPT-4o usage (which is approximately $1.71 USD).\n",
    "\n",
    "You can optionally choose to provide a `gpt4o_api_key`. If you do this, then we will use your API key to make GPT-4o calls, and your LlamaParse usage will be counted as if `gpt4o_mode` was not turned on (each page will be counted as a page instead of 10 pages). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aaa2ec5d-f27c-4262-80bf-e57daacff182",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-05-21 00:10:32--  https://www.dropbox.com/scl/fi/vu6w1dsfo5eddydz13ssm/2019-tesla-impact-report-15.pdf?rlkey=ik8lfqbg2p1ervss4qqt3xose&st=70j04z8j&dl=1\n",
      "Resolving www.dropbox.com (www.dropbox.com)... 2620:100:6057:18::a27d:d12, 162.125.13.18\n",
      "Connecting to www.dropbox.com (www.dropbox.com)|2620:100:6057:18::a27d:d12|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com/cd/0/inline/CTTnZs8U4V1GtUCNxoB7INwmLq2yU97Q6QbWS6uVnb_XdHe368GrqF0zLDEKTnpc-x7utwNUUpMvWjLyrujrqNVrbGKTKa6hwHu5BxYPA2zXYrzdAEZyeve274xpHZKFywQ/file?dl=1# [following]\n",
      "--2024-05-21 00:10:33--  https://uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com/cd/0/inline/CTTnZs8U4V1GtUCNxoB7INwmLq2yU97Q6QbWS6uVnb_XdHe368GrqF0zLDEKTnpc-x7utwNUUpMvWjLyrujrqNVrbGKTKa6hwHu5BxYPA2zXYrzdAEZyeve274xpHZKFywQ/file?dl=1\n",
      "Resolving uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com (uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com)... 2620:100:6057:15::a27d:d0f, 162.125.13.15\n",
      "Connecting to uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com (uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com)|2620:100:6057:15::a27d:d0f|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: /cd/0/inline2/CTSaARDHbxvyEEgefshmsHLbuXkgV1Rmr-ItVhk5lPuZXkLlNnZMZWCF9YF5j4t2lLs4VurFW2VI1Q4A6ZFi8D2RXJmUG3wdgJhR6qSaBpwRZxjB_vk8qkJb8h1jRDaL7ATK6XYTHncab_aoPWzB62vrZ9yXUM0Mr-EdCX1k-hMbzXLV2dorA71IuFPY8ICkTemRWaG6VhBd3bV0C5AkMsAqy90w6Kez1ySFO06UkrxLSmkCaKdFgVoLcUVO2PLv4rGv6AuZOF_kqwsHdh82J9DQU4PMMyg-f5ChSGGSCKgmUfTBE2qP1eISP-GXSB91yWwMf-7rxGtM8MpDp-AL5jxYZxhZcmZn1cU8Or_8OOZrxg/file?dl=1 [following]\n",
      "--2024-05-21 00:10:33--  https://uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com/cd/0/inline2/CTSaARDHbxvyEEgefshmsHLbuXkgV1Rmr-ItVhk5lPuZXkLlNnZMZWCF9YF5j4t2lLs4VurFW2VI1Q4A6ZFi8D2RXJmUG3wdgJhR6qSaBpwRZxjB_vk8qkJb8h1jRDaL7ATK6XYTHncab_aoPWzB62vrZ9yXUM0Mr-EdCX1k-hMbzXLV2dorA71IuFPY8ICkTemRWaG6VhBd3bV0C5AkMsAqy90w6Kez1ySFO06UkrxLSmkCaKdFgVoLcUVO2PLv4rGv6AuZOF_kqwsHdh82J9DQU4PMMyg-f5ChSGGSCKgmUfTBE2qP1eISP-GXSB91yWwMf-7rxGtM8MpDp-AL5jxYZxhZcmZn1cU8Or_8OOZrxg/file?dl=1\n",
      "Reusing existing connection to [uc872df1ff4ea2fecd3d024fa97a.dl.dropboxusercontent.com]:443.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 26199694 (25M) [application/binary]\n",
      "Saving to: ‘2019-tesla-impact-report-15.pdf’\n",
      "\n",
      "2019-tesla-impact-r 100%[===================>]  24.99M  30.5MB/s    in 0.8s    \n",
      "\n",
      "2024-05-21 00:10:35 (30.5 MB/s) - ‘2019-tesla-impact-report-15.pdf’ saved [26199694/26199694]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!wget \"https://www.dropbox.com/scl/fi/vu6w1dsfo5eddydz13ssm/2019-tesla-impact-report-15.pdf?rlkey=ik8lfqbg2p1ervss4qqt3xose&st=70j04z8j&dl=1\" -O \"2019-tesla-impact-report-15.pdf\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f46991c1-031b-461f-b9a6-9237a821f4c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_cloud_services import LlamaParse\n",
    "\n",
    "parser_gpt4o = LlamaParse(\n",
    "    result_type=\"markdown\",\n",
    "    # api_key=api_key,\n",
    "    gpt4o_mode=True,\n",
    "    split_by_page=True,\n",
    "    # gpt4o_api_key=\"<gpt4o_api_key>\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1136ba82-074b-489d-9b0a-d609ccbf02b6",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id bf7d4619-3e26-479d-80e9-25702186ef32\n",
      "."
     ]
    }
   ],
   "source": [
    "documents_gpt4o = parser_gpt4o.load_data(\"./2019-tesla-impact-report-15.pdf\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e65c54f-3e4c-4c78-b1e8-a55ebeba1f24",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Mission & Tesla Ecosystem\n",
      "\n",
      "Climate change is reaching alarming levels in large part due to emissions from burning fossil fuels for transportation and electricity generation. In 2016, carbon dioxide (CO2) concentration levels in the atmosphere exceeded the 400 parts per million threshold on a sustained basis - a level that climate scientists believe will have a catastrophic impact on the environment. Worse, annual global CO2 emissions continue to increase and have approximately doubled over the past 50 years to over 43 gigatons in 2019. The world’s current path is unwise and unsustainable.\n",
      "\n",
      "The world cannot reduce CO2 emissions without addressing both energy generation and consumption. And the world cannot address its energy habits without first directly reducing emissions in the transportation and energy sectors. We are focused on creating a complete energy and transportation ecosystem from solar generation and energy storage to all-electric vehicles that produce zero tailpipe emissions.\n",
      "\n",
      "Since the onset of shelter-in-place orders and travel restrictions due to COVID-19, we have seen dramatic increases in air quality across the planet, as well as projections for CO2 emissions to drop in excess of 4% in 2020 compared to pre-COVID-19 levels, according to researchers. Because these improvements in air quality and reductions in CO2 are a result of a global economic disruption and not due to systemic changes in how we produce and consume energy, they are not expected to be sustained absent intervention. However, these changes have shown us the positive impacts of reduced pollution in a very short period of time. At Tesla, we believe that we all have an unprecedented opportunity to learn from this disruption and accelerate the deployment of clean energy solutions as part of a recovery for all economies throughout the world, and we will actively continue to advocate for the realization of these long-term changes.\n",
      "\n",
      "| Global Greenhouse Gas (GHG) Emissions by Economic Sector |\n",
      "|----------------------------------------------------------|\n",
      "| ![Pie Chart](image_url)                                  |\n",
      "\n",
      "| Sector                                      | Percentage |\n",
      "|---------------------------------------------|------------|\n",
      "| Electricity & Heat Production*              | 31%        |\n",
      "| Agriculture, Forestry & Other Land Use      | 20%        |\n",
      "| Industry                                    | 18%        |\n",
      "| Transportation*                             | 16%        |\n",
      "| Other Energy                                | 9%         |\n",
      "| Buildings                                   | 6%         |\n",
      "\n",
      "*Tesla-related sectors. Source: World Resources Institute\n",
      "\n",
      "According to the Global Carbon project, when fully tallied, total carbon emissions from 2019 are expected to hit another record high of over 43 gigatons for the year. Energy use through electricity and heat production (31%) and transportation (16%) are significant drivers of these GHG emissions.\n"
     ]
    }
   ],
   "source": [
    "print(documents_gpt4o[3].get_content())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d62cbb62-37ea-4370-9411-d979aa3a627e",
   "metadata": {},
   "source": [
    "## Build RAG pipeline over the Parsed Report\n",
    "\n",
    "We now try building a RAG pipeline over this parsed report. It's not a lot of text, but we split it into chunks and load it into a simple in-memory vector store.\n",
    "\n",
    "We ask a question over the parsed markdown table and get back the right answer! We also ask a question over the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d8b7c3ad-2147-448c-bcbe-3e6fcd8d5361",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex\n",
    "\n",
    "vector_index = VectorStoreIndex(documents_gpt4o)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8013351a-180d-4947-9f81-513042175c19",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = vector_index.as_query_engine(similarity_top_k=6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "795dc5c4-e122-4ff3-94d2-747fa51d5add",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\n",
    "    \"What are the greenhouse emissions for agriculture and transportation?\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39d2e6bd-3316-49b5-9a5d-5b4b95343e5a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Agriculture accounts for 20% of global greenhouse gas emissions, while transportation contributes 16% of these emissions.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9beb5cd4-4041-48c7-b22b-de5540f92a6d",
   "metadata": {},
   "source": [
    "Let's also try asking a question over another piece of the text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "543c8b63-5cd1-47a1-a8a1-81abbfd3e52b",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\n",
    "    \"How does the EPA range of Teslas compare with other vehicles? Give details\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e739eabf-732b-4f59-9628-972c4bf6c857",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The EPA range of Tesla vehicles varies across different models. The Model 3 Standard Range Plus (SR+) achieves an EPA range of 4.8 miles/kWh, making it the most efficient electric vehicle in production. The Model Y all-wheel drive (AWD) achieves 4.1 miles/kWh, which positions it as the most efficient electric SUV produced to date. The energy efficiency of Tesla vehicles is highlighted by these EPA range figures, showcasing their advancements in powertrain efficiency compared to other electric vehicles on the market.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04b05c53-1a81-41a7-97f2-98a960211957",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Reducing Carbon Footprint Even Further\n",
      "## Improving Powertrain Efficiency\n",
      "\n",
      "Tesla vehicles are known to have the highest energy efficiency of any EV built to date. In the early days of Model S production, we were able to achieve energy efficiency of 3.1 EPA miles / kWh. Today, our most efficient Model 3 Standard Range Plus (SR+) achieves an EPA range of 4.8 miles / kWh, more than any EV in production. Model Y all-wheel drive (AWD) achieves 4.1 EPA miles / kWh, which makes it the most efficient electric SUV produced to date.\n",
      "\n",
      "The energy efficiency of Tesla vehicles will continue to improve further over time as we continue to improve our technology and powertrain efficiency. It is also reasonable to assume that our high-mileage products, such as our future Tesla Robotaxis, will be designed for maximum energy efficiency as handling, acceleration, and top speed become less relevant. That way, we will minimize cost for our customers as well as reduce the carbon footprint per mile driven.\n",
      "\n",
      "### Average Lifecycle Emissions in U.S. (gCO2e/mi)\n",
      "\n",
      "| Vehicle Type                          | Manufacturing Phase | Use Phase | Total Emissions |\n",
      "|---------------------------------------|---------------------|-----------|-----------------|\n",
      "| Avg. Mid-Size Premium ICE             |                     |           |                 |\n",
      "| Model 3 Personal Use (grid charged)   |                     |           |                 |\n",
      "| Model 3 Ridesharing Use (grid charged)|                     |           |                 |\n",
      "| Model 3 Personal Use (solar charged)  |                     |           |                 |\n",
      "| Model 3 Ridesharing Use (solar charged)|                    |           |                 |\n",
      "\n",
      "*Note: The chart shows that the emissions depend on powertrain efficiency.*\n",
      "\n",
      "### Energy Efficiency EPA range in miles/kWh\n",
      "\n",
      "| Vehicle Model       | EPA Range (miles/kWh) |\n",
      "|---------------------|-----------------------|\n",
      "| Model 3 SR+         | 4.8                   |\n",
      "| Model 3 AWD         |                       |\n",
      "| Model Y AWD         |                       |\n",
      "| Hyundai Kona        |                       |\n",
      "| Chevy Bolt          |                       |\n",
      "| Model S LR+         |                       |\n",
      "| Nissan Leaf         |                       |\n",
      "| Model X LR+         |                       |\n",
      "| Jaguar iPace        |                       |\n",
      "| Mercedes EQC*       |                       |\n",
      "| Ford Mach E AWD     |                       |\n",
      "| Audi e-tron         |                       |\n",
      "| Porsche Taycan      |                       |\n",
      "\n",
      "*Tesla estimate. Source: OEM websites*\n"
     ]
    }
   ],
   "source": [
    "print(response.source_nodes[0].get_content())"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_parse",
   "language": "python",
   "name": "llama_parse"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
